Skip to content

Added initial implementation of kosli snapshot cloud-run server/#4986#833

Open
ToreMerkely wants to merge 11 commits intomainfrom
4986-google-cloud-run-1
Open

Added initial implementation of kosli snapshot cloud-run server/#4986#833
ToreMerkely wants to merge 11 commits intomainfrom
4986-google-cloud-run-1

Conversation

@ToreMerkely
Copy link
Copy Markdown
Contributor

  • feat(snapshot): add hidden cloud-run skeleton command (#4986)
  • feat(cloudrun): add internal/cloudrun package wrapping the Cloud Run Admin API (#4986)
  • feat(snapshot): wire cloud-run command end-to-end via internal/cloudrun (#4986)
  • feat(snapshot): add service filtering flags to cloud-run command (#4986)
  • feat(cloudrun): friendly auth and not-found error messages (#4986)
  • fix(cloudrun): align snapshot payload with server's snapshot-examples doc (#4986)

@ToreMerkely ToreMerkely enabled auto-merge (squash) April 28, 2026 12:41
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Apr 28, 2026

Claude finished @ToreMerkely's task in 2m 19s —— View job


PR Review: kosli snapshot cloud-run

  • Phase 1: Static analysis (build, vet, mod tidy)
  • Phase 2: Code review
  • Phase 3: Post review findings

Static Analysis

All three checks pass cleanly — go build ./..., go vet ./..., and go mod tidy produce no errors or diff.


Code Review Summary

This is a well-structured addition. The vertical-slice approach is clear, the internal/cloudrun package is cleanly separated from the command layer, and the test coverage is thorough. A few observations:

One inline suggestion posted:

  • Close() drops the second error (internal/cloudrun/cloudrun.go:69-80) — if both gRPC Close() calls fail, only the first error is returned. errors.Join (available since Go 1.20, and you're on 1.25) would preserve both and simplify the function. Low impact for a CLI, but trivial to improve. See inline comment.

Observations (no action required):

  • The previous review round identified the gRPC resource leak and the services[:0] readability issue — both were fixed in 88c82026. The Close() method, partial-failure cleanup in New(), and io.Closer type-assert defer all look correct.

  • The LongDesc still says the command is "currently a hidden, in-development command" — that's accurate for now but will need updating in Slice 8 when the command is unhidden. Already tracked in the handover doc.

  • The RevisionData struct mixes camelCase (revisionName) and snake_case (service_name) JSON tags. The handover doc explains this is intentional to match the ECS wire format. Not a bug — just noting for future readers.

  • Error classification (Classify) is applied at the command boundary, not inside ListServices — good design choice that avoids double-wrapping and keeps the package testable.

What's done well

  • Test coverage is excellent. The internal/cloudrun package has 11 unit tests covering the key edge cases (traffic splits, LATEST resolution, dedup, zero-percent revisions, tag-pinned images, no-traffic services, multi-container, multiple services). The command tests cover arg validation, flag mutual exclusion (4 pairs), filtering (include/exclude by name and regex), the full server roundtrip, and the auth error UX. This is thorough.

  • Clean package boundary. internal/cloudrun owns GCP knowledge (API calls, protobuf mapping, error classification) while the command layer owns Kosli concerns (flags, filtering, HTTP submission). The apiClient interface enables testing without GCP credentials.

  • Follows established patterns. The command structure (newSnapshotCloudRunCmd, run, cmdTestCase table, suite.Suite) mirrors the existing ECS command closely. Filter flags use the shared filters.ResourceFilterOptions. The payload shape follows out-snapshot-examples.txt.

  • The handover doc (docs/handover/2026-04-28-4986-google-cloud-run-1.md) is unusually detailed and well-maintained — decisions, constraints, and the slice plan are all documented with rationale. Makes the PR self-explanatory.

Verdict: This looks good to merge. The errors.Join suggestion is a minor improvement but not blocking.

Comment thread internal/cloudrun/cloudrun.go
Comment thread cmd/kosli/snapshotCloudRun.go Outdated
Comment thread internal/cloudrun/cloudrun.go
Comment thread cmd/kosli/snapshotCloudRun.go
Comment thread internal/cloudrun/cloudrun.go
Comment thread internal/cloudrun/payload.go
ToreMerkely and others added 6 commits April 28, 2026 18:25
Registers `kosli snapshot cloud-run ENVIRONMENT-NAME` as a hidden,
in-development command for Google Cloud Run. The command takes
required `--project` and `--region` flags, forces dry-run mode, and
prints a placeholder line — no GCP API calls or HTTP requests yet.
This is the first slice of the Cloud Run feature; subsequent slices
will add the GCP client wrapper, the end-to-end happy path, filtering,
auth UX, and finally unhide the command.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…Admin API (#4986)

Slice 2 of the snapshot cloud-run feature. Adds a small, unit-tested
wrapper around cloud.google.com/go/run/apiv2 that lists services in a
GCP project + region and, for each service, returns one Revision per
entry in the service's traffic configuration (any percent including
0%). TrafficTargetAllocationType_LATEST resolves via the service's
LatestReadyRevision, and revisions referenced more than once are
deduped. Digest extraction mirrors the ECS fallback in
internal/aws/aws.go: use a @sha256: substring when present, else leave
the digest empty rather than calling a registry. Production code
reaches GCP through an unexported apiClient seam so tests substitute a
fake without touching ADC.

No command wiring yet; that's the next slice.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…un (#4986)

Slice 3 of the snapshot cloud-run feature. The command now calls
cloudrun.New + ListServices, flattens the result into an EnvRequest
payload, and submits a PUT to /report/cloud-run via kosliClient.Do.
The PreRunE-forced dry-run keeps the request from actually leaving the
client, so the (still-missing) server-side endpoint is not on the path
yet. Each artifact carries the GCP project and region in addition to
service_name, so revision rows are self-describing — a small extension
of the EcsEnvRequest shape.

The command depends on a local cloudRunLister interface and a package-
level newCloudRunClient variable, letting tests swap in a stub without
touching ADC; integration was sanity-checked against the real
hello-world-cli-demo project and produced the expected
digest-pinned artifact.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Slice 4 of the snapshot cloud-run feature. Adds --services,
--services-regex, --exclude, and --exclude-regex, mirroring the ECS
service filtering shape and reusing the existing
filters.ResourceFilterOptions struct. PreRunE rejects the four
include/exclude mutex pairs.

Filtering is applied in the command after cloudrun.ListServices
returns. Services excluded by name still incur their per-revision
API round-trips; pushing the filter into the GCP wrapper to skip
those calls is a tractable follow-up if the round-trip cost becomes
a bottleneck.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Slice 6 of the snapshot cloud-run feature. cloudrun.Classify maps
gRPC Unauthenticated / PermissionDenied / NotFound responses into
actionable messages and preserves the underlying error via %w; other
codes pass through. The Unauthenticated message names all three
credential sources (env var, gcloud command, GCE/GKE metadata server
or Workload Identity) because the production deployment is a cluster
cron job, not a local-dev gcloud session.

Classification lives in internal/cloudrun (the package owns GCP
knowledge) but is applied at the command boundary, not inside
Client.ListServices. Doing it inside the Client would double-wrap
real-call errors when the command also classified them, and it would
bypass the stubbed test path entirely. Calling Classify once at the
command boundary covers both real and stub error sources.

cloudrun.New now wraps construction errors with a generic "GCP client
setup failed" prefix; the cluster case rarely fails here, and the SDK
message is preserved for diagnosis.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… doc (#4986)

Three changes brings the payload into line with the conventions
documented in the server repo's out-snapshot-examples.txt:

- Added the top-level "type": "cloud-run" literal that every other
  env-type report ships explicitly. Without it, the (still-to-come)
  CloudRunReport model on the server would only accept the URL
  default; with it, the request is unambiguous.
- Renamed the per-artifact "service_name" to "serviceName" (camelCase)
  matching the doc's K8S/ECS examples. The existing CLI's ECS code
  uses snake_case, but a new type is better off following the doc.
- Dropped the per-artifact "project" and "region" fields. The doc
  notes extra="forbid" on every Pydantic model, so unilaterally
  picking custom field names risks 422 once the server defines its
  CloudRunReport. Project + region are already in the URL and flags,
  mirroring how ECS reports don't carry account/region per artifact.

Verified end-to-end against the hello-world-cli-demo project; payload
now matches the doc shape exactly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mbevc1 mbevc1 force-pushed the 4986-google-cloud-run-1 branch from 48c4c0e to fa367b3 Compare April 28, 2026 17:25
Comment thread internal/cloudrun/cloudrun.go
Comment thread cmd/kosli/snapshotCloudRun.go
Comment thread internal/cloudrun/cloudrun.go
Comment thread internal/cloudrun/payload.go
Comment thread internal/cloudrun/cloudrun.go
@ToreMerkely ToreMerkely disabled auto-merge April 29, 2026 06:54
ToreMerkely and others added 3 commits April 29, 2026 11:09
…(#4986)

The serviceName JSON tag was originally chosen to match the server's
out-snapshot-examples doc, but the existing CLI's ECS code sends
cluster_name/service_name on the wire and is in production today;
Azure does the same with app_name/app_kind/digests_source. The doc
disagrees with reality, so align with reality. The wire convention
across types now is: unique-ID field camelCase (revisionName, taskArn,
podName), grouping field snake_case (service_name, cluster_name,
app_name), digests + creationTimestamp universal.

Also numbered the cmdTestCase entries in TestSnapshotCloudRunCmd
(01-09) to match the convention from attestSonar_test.go, which makes
individual cases easier to spot when scanning suite output.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR review feedback. Two issues:

cloudrun.New opened gRPC services and revisions clients but never
closed them. For a one-shot CLI invocation the OS reclaims the
sockets, but the gRPC library logs noisy "clientConn switching
balancer" warnings on stderr and the keepalive goroutines outlive the
work. There was also a partial-failure leak: if NewRevisionsClient
returned an error after services succeeded, services was abandoned.

Adds Client.Close() releasing both connections (with the second
attempted even when the first errors) and a cleanup of services in
the partial-failure path. The command defers Close via an io.Closer
type-assert, so the test stub does not need to implement it.

Also replaced `filtered := services[:0]` (correct in-place reuse but
a subtle idiom — relies on range copying each element before the
overwrite via append) with `make([]cloudrun.Service, 0, len(services))`
for clarity.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
// RevisionData represents one Cloud Run revision in the snapshot payload.
// One artifact is emitted per revision in each service's traffic configuration.
type RevisionData struct {
RevisionName string `json:"revisionName"`
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit (naming consistency): revisionName (camelCase) and service_name (snake_case) in the same struct is unusual. I see from the handover doc and the ECS precedent (taskArn + service_name) that this is a deliberate choice to match the existing wire format. Just flagging for anyone reading the code fresh — it looks like a typo but is intentional.

No change needed.

Comment on lines +66 to +69
// Close releases the underlying gRPC connections. Safe to call on a Client
// constructed with a fake apiClient (returns nil). Returns the first error
// from closing either client; the second is always attempted.
func (c *Client) Close() error {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: Close() silently drops the second error.

If both services.Close() and revisions.Close() fail, only sErr is returned — rErr is silently dropped. For a CLI that's about to exit this is low-impact, but errors.Join (Go 1.20+) would preserve both:

Suggested change
// Close releases the underlying gRPC connections. Safe to call on a Client
// constructed with a fake apiClient (returns nil). Returns the first error
// from closing either client; the second is always attempted.
func (c *Client) Close() error {
func (c *Client) Close() error {
g, ok := c.api.(*gcpAPI)
if !ok {
return nil
}
return errors.Join(g.services.Close(), g.revisions.Close())
}

This also simplifies the function. Low priority — the current version works fine for a short-lived CLI process.

Comment thread cmd/kosli/snapshotCloudRun.go Outdated

const snapshotCloudRunShortDesc = `Report a snapshot of running services in a Google Cloud Run project and region to Kosli. `
const snapshotCloudRunLongDesc = snapshotCloudRunShortDesc + `
Currently a hidden, in-development command — it always runs in dry-run mode regardless of the --dry-run flag.`
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: LongDesc still says "it always runs in dry-run mode".

This is correct for now since the command is hidden and DryRun is forced to true in PreRunE. Just a reminder that Slice 7 (unhide + lift dry-run) will need to update this string. No change needed now.

ToreMerkely and others added 2 commits April 29, 2026 13:28
Lifts the forced dry-run lock that's been on since Slice 1, so the
command actually PUTs the snapshot to /report/cloud-run. Adds the
ECS-style success log "[N] revisions were reported to environment X"
when the request succeeds outside dry-run.

Test changes:
- SetupTest now creates the env with type "cloud-run" via CreateEnv,
  the same way ECS/K8S tests do. Confirms the local server's
  CloudRunReport endpoint accepts the new env-type literal.
- The dry-run-asserting tests (cmdTestCase 05 and the four
  TestSnapshotCloudRunFilter_* cases) take explicit --dry-run since
  the global default is no longer forced.
- New TestSnapshotCloudRunCmd_HappyPathReportsToServer exercises the
  full CLI → local Kosli server roundtrip with the GCP client
  stubbed, asserting the success log line.
- Stub fixtures use 64-char hex digests because the server's
  CloudRunReport validates `^[a-f0-9]{64}$` per digest value.

Hidden: true stays for now; unhiding + CLI reference docs land in
the next slice.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment on lines +69 to +80
func (c *Client) Close() error {
g, ok := c.api.(*gcpAPI)
if !ok {
return nil
}
sErr := g.services.Close()
rErr := g.revisions.Close()
if sErr != nil {
return sErr
}
return rErr
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: Close() silently drops the second error.

As noted in the previous review round, if both services.Close() and revisions.Close() fail, rErr is silently dropped. Since you're on Go 1.25, errors.Join is available and simplifies this:

Suggested change
func (c *Client) Close() error {
g, ok := c.api.(*gcpAPI)
if !ok {
return nil
}
sErr := g.services.Close()
rErr := g.revisions.Close()
if sErr != nil {
return sErr
}
return rErr
}
func (c *Client) Close() error {
g, ok := c.api.(*gcpAPI)
if !ok {
return nil
}
return errors.Join(g.services.Close(), g.revisions.Close())
}

(Requires adding "errors" to the import block.)

Low impact for a short-lived CLI, but it's a one-line improvement and preserves both errors for diagnostics.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants